NSF PAR Search | NSF Public Access Repository

Labeled Data Generation with Inexact Supervision

https://doi.org/10.1145/3447548.3467306

Dai, Enyan; Shu, Kai; Sun, Yiwei; Wang, Suhang (August 2021, 2021 ACM SIGKDD International Conference on Knowledge Discovery and Data Mining)
Full Text Available
High-throughput microbial culturomics using automation and machine learning

https://doi.org/10.1038/s41587-023-01674-2

Huang, Yiming; Sheth, Ravi U.; Zhao, Shijie; Cohen, Lucas A.; Dabaghi, Kendall; Moody, Thomas; Sun, Yiwei; Ricaurte, Deirdre; Richardson, Miles; Velez-Cortes, Florencia; et al (February 2023, Nature Biotechnology)

Pure bacterial cultures remain essential for detailed experimental and mechanistic studies in microbiome research, and traditional methods to isolate individual bacteria from complex microbial ecosystems are labor-intensive, difficult-to-scale and lack phenotype–genotype integration. Here we describe an open-source high-throughput robotic strain isolation platform for the rapid generation of isolates on demand. We develop a machine learning approach that leverages colony morphology and genomic data to maximize the diversity of microbes isolated and enable targeted picking of specific genera. Application of this platform on fecal samples from 20 humans yields personalized gut microbiome biobanks totaling 26,997 isolates that represented >80% of all abundant taxa. Spatial analysis on >100,000 visually captured colonies reveals cogrowth patterns between Ruminococcaceae, Bacteroidaceae, Coriobacteriaceae and Bifidobacteriaceae families that suggest important microbial interactions. Comparative analysis of 1,197 high-quality genomes from these biobanks shows interesting intra- and interpersonal strain evolution, selection and horizontal gene transfer. This culturomics framework should empower new research efforts to systematize the collection and quantitative analysis of imaging-based phenotypes with high-resolution genomics data for many emerging microbiome studies.
Explainable Multivariate Time Series Classification: A Deep Neural Network Which Learns to Attend to Important Variables As Well As Time Intervals

https://doi.org/10.1145/3437963.3441815

Hsieh, Tsung-Yu; Wang, Suhang; Sun, Yiwei; Honavar, Vasant (March 2021, The 14th ACM International Conference on Web Search and Data Mining)
Full Text Available
SrVARM: State Regularized Vector Autoregressive Model for Joint Learning of Hidden State Transitions and State-Dependent Inter-Variable Dependencies from Multi-variate Time Series

https://doi.org/10.1145/3442381.3450116

Hsieh, Tsung-Yu; Sun, Yiwei; Tang, Xianfeng; Wang, Suhang; Honavar, Vasant G. (April 2021, WWW '21: Proceedings of the Web Conference 2021)
Full Text Available
Ginger Cannot Cure Cancer: Battling Fake Health News with a Comprehensive Data Repository

Dai, Enyan; Sun, Yiwei; Wang, Suhang (May 2020, Proceedings of the International AAAI Conference on Weblogs and Social Media)

Nowadays, Internet is a primary source of attaining health information. Massive fake health news which is spreading over the Internet, has become a severe threat to public health. Numerous studies and research works have been done in fake news detection domain, however, few of them are designed to cope with the challenges in health news. For instance, the development of explainable is required for fake health news detection. To mitigate these problems, we construct a comprehensive repository, FakeHealth, which includes news contents with rich features, news reviews with detailed explanations, social engagements and a user-user social network. Moreover, exploratory analyses are conducted to understand the characteristics of the datasets, analyze useful patterns and validate the quality of the datasets for health fake news detection. We also discuss the novel and potential future research directions for the health fake news detection.
Full Text Available
LMLFM: Longitudinal Multi-Level Factorization Machine

https://doi.org/10.1609/aaai.v34i04.5916

Liang, Junjie; Xu, Dongkuan; Sun, Yiwei; Honavar, Vasant (June 2020, Proceedings of the AAAI Conference on Artificial Intelligence)
We consider the problem of learning predictive models from longitudinal data, consisting of irregularly repeated, sparse observations from a set of individuals over time. Such data often exhibit longitudinal correlation (LC) (correlations among observations for each individual over time), cluster correlation (CC) (correlations among individuals that have similar characteristics), or both. These correlations are often accounted for using mixed effects models that include fixed effects and random effects, where the fixed effects capture the regression parameters that are shared by all individuals, whereas random effects capture those parameters that vary across individuals. However, the current state-of-the-art methods are unable to select the most predictive fixed effects and random effects from a large number of variables, while accounting for complex correlation structure in the data and non-linear interactions among the variables. We propose Longitudinal Multi-Level Factorization Machine (LMLFM), to the best of our knowledge, the first model to address these challenges in learning predictive models from longitudinal data. We establish the convergence properties, and analyze the computational complexity, of LMLFM. We present results of experiments with both simulated and real-world longitudinal data which show that LMLFM outperforms the state-of-the-art methods in terms of predictive accuracy, variable selection ability, and scalability to data with large number of variables. The code and supplemental material is available at https://github.com/junjieliang672/LMLFM.
Full Text Available
Investigating and Mitigating Degree-Related Biases in Graph Convolutional Networks

https://doi.org/10.1145/3340531.3411872

Tang, Xianfeng; Yao, Huaxiu; Sun, Yiwei; Wang, Yiqi; Tang, Jiliang; Aggarwal, Charu; Mitra, Prasenjit; Wang, Suhang (October 2020, The 29th ACM International Conference on Information & Knowledge Management)
Full Text Available
Sulfur-Rich Graphene Nanoboxes with Ultra-High Potassiation Capacity at Fast Charge: Storage Mechanisms and Device Performance

https://doi.org/10.1021/acsnano.0c09290

Sun, Yiwei; Wang, Huanlei; Wei, Wenrui; Zheng, Yulong; Tao, Lin; Wang, Yixian; Huang, Minghua; Shi, Jing; Shi, Zhi-Cheng; Mitlin, David (January 2021, ACS Nano)
Full Text Available
LMLFM: Longitudinal Multi-Level Factorization Machine

Liang, Junjie; Xu, Dongkuan; Sun, Yiwei; Honavar, Vasant G. (January 2020, Proceedings of the 34th AAAI Conference on Artificial Intelligence)

We consider the problem of learning predictive models from longitudinal data, consisting of irregularly repeated, sparse observations from a set of individuals over time. Such data often exhibit longitudinal correlation (LC) (correlations among observations for each individual over time), cluster correlation (CC) (correlations among individuals that have similar characteristics), or both. These correlations are often accounted for using mixed effects models that include fixed effects and random effects, where the fixed effects capture the regression parameters that are shared by all individuals, whereas random effects capture those parameters that vary across individuals. However, the current state-of-the-art methods are unable to select the most predictive fixed effects and random effects from a large number of variables, while accounting for complex correlation structure in the data and non-linear interactions among the variables. We propose Longitudinal Multi-Level Factorization Machine (LMLFM), to the best of our knowledge, the first model to address these challenges in learning predictive models from longitudinal data. We establish the convergence properties, and analyze the computational complexity, of LMLFM. We present results of experiments with both simulated and real-world longitudinal data which show that LMLFM outperforms the state-of-the-art methods in terms of predictive accuracy, variable selection ability, and scalability to data with large number of variables. The code and supplemental material is available at https://github.com/junjieliang672/LMLFM.
Full Text Available
Adaptive Structural Co-regularization for Unsupervised Multi-view Feature Selection

https://doi.org/10.1109/ICBK.2019.00020

Hsieh, Tsung-Yu; Sun, Yiwei; Wang, Suhang; Honavar, Vasant (November 2019, 2019 IEEE International Conference on Big Knowledge)

Full Text Available
